Abstract

We consider the channel access problem under imperfect sensing of channel state in a multi-channel 
opportunistic communication system, where the state of each channel evolves as an independent and 
identically distributed Markov process. The considered problem can be cast into a restless multi-armed 
bandit (RMAB) problem that is of fundamental importance in decision theory. It is well-known that 
solving the RMAB problem is PSPACE-hard, with the optimal policy usually intractable due to the 
exponential computation complexity. A natural alternative is to consider the easily implementable myopic 
policy that maximizes the immediate reward but ignores the impact of the current strategy on the future 
reward. In this paper, we perform an analytical study on the optimality of the myopic policy under 
imperfect sensing for the considered RMAB problem. Specifically, for a family of generic and practically 
important utility functions, we establish the closed-form conditions under which the myopic policy is 
guaranteed to be optimal even under imperfect sensing. Although we focus on opportunistic channel
access, the obtained results are generic in nature and apply to a wide range of engineering
domains.
I. Introduction

We consider an opportunistic multi-channel communication system in which a user has access to 
multiple channels, but is limited to sense and transmit only on a subset of them at a time. The fundamental 
problem we study is how the sender can exploit past observations and the knowledge of the stochastic 
properties of the channels to maximize its utility (e.g., expected throughput) by switching opportunistically 
across channels. 

Formally, the considered channel access problem can be cast into the restless multi-armed bandit 
(RMAB) problem, one of the most well-known generalizations of the classic multi-armed bandit (MAB) 
problem, which is of fundamental importance in stochastic decision theory. The standard formulation of 
the RMAB problem can be briefly summarized as follows: There is a bandit of N independent arms, 
each evolving as a two-state Markov process. At each time slot, a player chooses $k$ ($1 \le k \le N$) of
the N arms to play and receives a certain amount of reward depending on the state of the played arms. 
Given the initial state of the system, the goal of the player is to find the optimal policy of playing the k 
arms at each slot so as to maximize the aggregated discounted long-term reward. 

Despite significant research efforts in the field, the RMAB problem in its generic form still remains
open. To date, very few results have been reported on the structure of the optimal policy. Obtaining the
optimal policy for a general RMAB problem is often intractable due to the exponential computation
complexity. Hence, a natural alternative is to seek a simple myopic policy maximizing the short-term
reward. Due to its simple and robust structure, the myopic sensing policy has begun to attract significant
research attention, especially regarding its optimality.

The vast majority of studies in the area assume perfect observation of channel states. However, sensing
or observation errors are inevitable in practical scenarios (e.g., due to noise and system limitations),
especially in wireless communication systems, which are the focus of our work. More specifically, a good
(bad, respectively) channel may be sensed as bad (good), and accessing a bad channel leads to zero
reward. In such a context, it is crucial to study the structure and the optimality of the myopic sensing
policy with imperfect observation. We would like to emphasize that the presence of sensing error brings
two difficulties when studying the myopic sensing policy in this new context.

• The channel state evolves as a non-linear mapping (w.r.t. the current channel state) instead of a 
linear one in the perfect sensing case. 

• In the non-perfect sensing case, the state transition of a channel depends not only on the channel 
evolution itself, but also on the observation outcome, meaning that the transition is not deterministic. 
Due to the above particularities¹, our problem requires an original study on the optimality of the
myopic sensing policy that cannot draw on existing results in the perfect sensing case. Despite its
practical importance and particularities, very little work has been done on the impact of sensing error
on the performance of the myopic sensing policy, or more generically, on the RMAB problem under
imperfect observation. To the best of our knowledge, [1] is the only work in this area, where the
optimality of the myopic policy is proved for the case of two channels with a particular utility function.
In this paper, we derive closed-form conditions under which the myopic sensing policy is optimal under
imperfect sensing for arbitrary $N$ and generic utility functions. As shown in Section III-C, the result
obtained in this paper covers the result of [1]. Moreover, this paper significantly extends our previous
work [2], which focuses on the perfect sensing scenario and whose analysis cannot be applied in the
imperfect sensing scenario due to the non-trivial particularities introduced by sensing error, as mentioned
previously. In this regard, our work contributes to the existing literature by developing an adapted
analysis of the RMAB problem under imperfect sensing within the generic framework proposed in [2].

The rest of the paper is organized as follows: Our model is formulated in Section II. Section III studies
the optimality of the myopic sensing policy and illustrates the application of the derived results via two
typical examples. A detailed discussion of the related work is given in Section IV. Finally, the paper is
concluded in Section V.

II. Problem Formulation 

A. Multi-channel Opportunistic Access with Imperfect Sensing 

As outlined in the Introduction, we consider a multi-channel opportunistic communication system,
in which a user is able to access a set $\mathcal{N}$ of $N$ independent and statistically identical channels,
each characterized by a Markov chain of two states, good/idle (1) and bad/busy (0). The state transition
probabilities are given by $\{p_{ij}\}$, $i, j = 0, 1$. We assume that the system operates in a synchronous
time-slotted fashion with the time slot indexed by $t$ ($t = 1, 2, \ldots, T$), where $T$ is the time horizon
of interest. Each channel goes through state transition at the beginning of each slot $t$. This generic
multi-channel opportunistic communication model can be naturally cast into the opportunistic spectrum
access (OSA) problem in cognitive radio systems where an unlicensed secondary user can opportunistically
access the temporarily unused channels of the licensed primary users, with the availability of each channel
evolving as an independent Markov chain.

1 Please refer to the remark following equation (1) for a detailed analysis.

Limited by hardware constraints and energy cost, the user is allowed to sense only $k$ ($1 \le k \le N$)
of the $N$ channels at each slot $t$. We denote the set of channels chosen by the user at slot $t$ by
$\mathcal{A}(t)$, where $\mathcal{A}(t) \subseteq \mathcal{N}$ and $|\mathcal{A}(t)| = k$. We assume that the user makes the
channel selection decision at the beginning of each slot after the channel state transition. Moreover, we
are interested in the imperfect sensing scenario where channel sensing is subject to errors, i.e., a good
channel may be sensed as a bad one and vice versa. Let $S(t) = [S_1(t), \ldots, S_N(t)]$ denote the channel
state vector, where $S_i(t) \in \{0, 1\}$ is the state of channel $i$ in slot $t$, and let
$S'(t) = \{S'_i(t), i \in \mathcal{A}(t)\}$ denote the sensing outcome vector, where $S'_i(t) = 0$ (1) means that
channel $i$ is sensed bad (good) in slot $t$. Using this notation, the performance of channel state detection
is characterized by two system parameters: the probability of false alarm $\epsilon_i(t)$ and the probability
of miss detection $\delta_i(t)$, formally defined as follows:

$$\epsilon_i(t) \triangleq \Pr\{S'_i(t) = 0 \mid S_i(t) = 1\},$$

$$\delta_i(t) \triangleq \Pr\{S'_i(t) = 1 \mid S_i(t) = 0\}.$$

In our analysis, we consider the case where $\epsilon_i(t)$ and $\delta_i(t)$ are independent of $t$ and $i$.
More specifically, we denote by $\epsilon$ and $\delta$ the system-wide false alarm rate and miss detection
rate. We also assume that when the receiver successfully receives a packet from a channel, it sends an
acknowledgement (ACK) to the transmitter over the same channel at the end of the slot. The absence of an
ACK signifies that the transmitter either did not transmit over this channel or transmitted while the
channel was busy in this slot.

Obviously, by sensing only $k$ out of $N$ channels, the user cannot observe the state information of
the whole system. Hence, the user has to infer the channel states from its past decision and observation
history so as to make its future decisions. To this end, we define the channel state belief vector
(hereinafter referred to as the belief vector for brevity) $\Omega(t) = \{\omega_i(t), i \in \mathcal{N}\}$, where
$0 \le \omega_i(t) \le 1$ is the conditional probability that channel $i$ is in state good (i.e., $S_i(t) = 1$)
at slot $t$ given all past states, actions and observations². Due to the Markovian nature of the channel
model, the belief vector can be updated recursively using Bayes' rule as shown in (1):

$$
\omega_i(t+1) =
\begin{cases}
p_{11}, & i \in \mathcal{A}(t),\ \mathrm{ACK} = 1, \\
\tau(\varphi(\omega_i(t))), & i \in \mathcal{A}(t),\ \mathrm{ACK} = 0, \\
\tau(\omega_i(t)), & i \notin \mathcal{A}(t),
\end{cases}
\tag{1}
$$

2 The initial belief $\omega_i(1)$ can be set to $\frac{p_{01}}{p_{01} + 1 - p_{11}}$ if no information about the initial system state is available.
where ACK = 1 denotes the case where an ACK is received (successful transmission, i.e., $S'_i(t) = 1$
and $S_i(t) = 1$) and ACK = 0 denotes the case where no ACK is received (failed transmission or no
transmission, i.e., $S'_i(t) = 1$, $S_i(t) = 0$ or $S'_i(t) = 0$),
$\varphi(\omega_i) = \frac{\epsilon\omega_i(t)}{\epsilon\omega_i(t) + 1 - \omega_i(t)}$, and

$$\tau(\omega_i(t)) = \omega_i(t) p_{11} + [1 - \omega_i(t)] p_{01} \tag{2}$$

denotes the operator for the one-step belief update.
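The update rule (1) and the operators $\tau$ and $\varphi$ can be sketched in a few lines of Python. This is only an illustration under the reading that $\epsilon$ is the probability that a good channel is sensed bad, which is how $\epsilon$ enters $\varphi$; the function names `tau`, `phi` and `update_belief` are hypothetical and not taken from the paper.

```python
def tau(omega: float, p01: float, p11: float) -> float:
    """One-step Markov prediction of the belief, as in equation (2)."""
    return omega * p11 + (1.0 - omega) * p01


def phi(omega: float, eps: float) -> float:
    """Belief that a sensed channel was good given that no ACK came back.

    Assumption: eps is the probability that a good channel is sensed bad.
    """
    denom = eps * omega + 1.0 - omega
    if denom <= 0.0:
        # Only reachable when omega == 1 and eps == 0: a missing ACK then has
        # probability zero, so the posterior is undefined.
        raise ValueError("no-ACK event has zero probability")
    return eps * omega / denom


def update_belief(omega: float, sensed: bool, ack: bool,
                  p01: float, p11: float, eps: float) -> float:
    """Next-slot belief of one channel, following the three cases of (1)."""
    if not sensed:
        return tau(omega, p01, p11)      # channel not sensed
    if ack:
        return p11                       # sensed and ACK received
    return tau(phi(omega, eps), p01, p11)  # sensed, no ACK
```

Only the third branch is non-linear in the current belief, which is the source of the difficulty discussed in the remark that follows.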

Remark. We would like to emphasize that, in contrast to the perfect sensing case [2] where $\omega_i(t+1)$
is a linear function of $\omega_i(t)$ whether or not channel $i$ is sensed, in the imperfect sensing case the
mapping from $\omega_i(t)$ to $\omega_i(t+1)$ is no longer linear due to the sensing error (cf. the second line
of equation (1)). Moreover, the state transition of a channel depends not only on the channel evolution
itself, but also on the observation outcome, i.e., $\omega_i(t+1) = p_{11}$ for $i \in \mathcal{A}(t)$, ACK = 1 and
$\omega_i(t+1) = \tau(\varphi(\omega_i(t)))$ for $i \in \mathcal{A}(t)$, ACK = 0. As will be shown later, these
differences make the analysis for imperfect sensing more complicated.

To conclude this subsection, we state some structural properties of $\tau(\omega_i(t))$ and
$\varphi(\omega_i(t))$ that are useful in the subsequent proofs.

Lemma 1. If $p_{01} < p_{11}$, then

• $\tau(\omega_i(t))$ is monotonically increasing in $\omega_i(t)$;

• $p_{01} \le \tau(\omega_i(t)) \le p_{11}$, $\forall\, 0 \le \omega_i(t) \le 1$.

Proof: Lemma 1 follows straightforwardly from $\tau(\omega_i(t)) = (p_{11} - p_{01})\omega_i(t) + p_{01}$. ■
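Lemma 2. If $0 \le \epsilon \le \frac{(1-p_{11})p_{01}}{p_{11}(1-p_{01})}$, then

• $\varphi(\omega_i(t))$ increases monotonically in $\omega_i(t)$ with $\varphi(0) = 0$ and $\varphi(1) = 1$;

• $\varphi(\omega_i(t)) \le p_{01}$, $\forall\, p_{01} \le \omega_i(t) \le p_{11}$.

Proof: Noticing that $\varphi(\omega_i) = \frac{\epsilon\omega_i(t)}{\epsilon\omega_i(t) + 1 - \omega_i(t)}$, Lemma 2 follows straightforwardly. ■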
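The threshold on $\epsilon$ in Lemma 2 can be checked numerically. The sketch below assumes the form $\varphi(\omega) = \epsilon\omega / (\epsilon\omega + 1 - \omega)$ used in the proof; `eps_threshold` and `lemma2_holds` are hypothetical helper names, and the grid check is an illustration rather than a proof.

```python
def eps_threshold(p01: float, p11: float) -> float:
    """Largest sensing error rate allowed by the condition of Lemma 2."""
    return (1.0 - p11) * p01 / (p11 * (1.0 - p01))


def phi(omega: float, eps: float) -> float:
    """No-ACK posterior, in the form used in the proof of Lemma 2."""
    return eps * omega / (eps * omega + 1.0 - omega)


def lemma2_holds(p01: float, p11: float, eps: float, grid: int = 101) -> bool:
    """Check phi(omega) <= p01 on an evenly spaced grid over [p01, p11]."""
    omegas = [p01 + (p11 - p01) * i / (grid - 1) for i in range(grid)]
    return all(phi(w, eps) <= p01 + 1e-12 for w in omegas)
```

For instance, with $p_{01} = 0.2$ and $p_{11} = 0.8$ the threshold evaluates to 0.0625; the bound on $\varphi$ holds at that value and fails for a larger error rate such as 0.2.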

B. Optimal Sensing Problem Formulation and Myopic Sensing Policy 

Given the imperfect sensing context, we are interested in the user's optimization problem of finding the
optimal sensing policy $\pi^*$ that maximizes the expected total discounted reward over a finite horizon.
Mathematically, a sensing policy $\pi$ is defined as a mapping from the belief vector $\Omega(t)$ to the action
(i.e., the set of channels to sense) $\mathcal{A}(t)$ in each slot $t$:
$\pi : \Omega(t) \to \mathcal{A}(t)$, $|\mathcal{A}(t)| = k$, $t = 1, 2, \ldots, T$.
The following gives the formal definition of the optimal sensing problem:

$$\pi^* = \arg\max_{\pi} \mathbb{E}\left[\sum_{t=1}^{T} \beta^{t-1} R_\pi(\Omega(t)) \,\middle|\, \Omega(1)\right], \tag{3}$$

where $R_\pi(\Omega(t))$ is the reward collected in slot $t$ under the sensing policy $\pi$ with the initial belief
vector $\Omega(1)$, and $0 \le \beta \le 1$ is the discount factor characterizing the feature that future rewards
are less valuable than the immediate reward. By treating the belief value of each channel as the state of
each arm of a bandit, the user's optimization problem can be cast into a restless multi-armed bandit problem.

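In order to gain more insight into the structure of the optimization problem formulated in (3) and the
complexity of solving it, we derive the dynamic programming formulation of (3) as follows:

$$V_T(\Omega(T)) = \max_{\pi} \mathbb{E}[R_\pi(\Omega(T))] = \max_{\mathcal{A}(T) \subseteq \mathcal{N},\, |\mathcal{A}(T)| = k} \mathbb{E}[R_\pi(\Omega(T))],$$

$$V_t(\Omega(t)) = \max_{\mathcal{A}(t) \subseteq \mathcal{N},\, |\mathcal{A}(t)| = k} \mathbb{E}\Big[ R_\pi(\Omega(t)) + \beta \sum_{\mathcal{E} \subseteq \mathcal{A}(t)} \prod_{i \in \mathcal{E}} (1-\epsilon)\omega_i(t) \prod_{j \in \mathcal{A}(t) \setminus \mathcal{E}} [1 - (1-\epsilon)\omega_j(t)]\, V_{t+1}(\Omega(t+1)) \Big].$$

In the above equations, $V_t(\Omega(t))$ is the value function corresponding to the maximal expected reward
from time slot $t$ to $T$ ($1 \le t \le T$), with the belief vector $\Omega(t+1)$ following the evolution described
in (1) given that the channels in the subset $\mathcal{E}$ are sensed in state good and the channels in
$\mathcal{A}(t) \setminus \mathcal{E}$ are sensed in state bad.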

Theoretically, the optimal policy can be obtained by solving the above dynamic program. Unfortunately,
due to the impact of the current action on the future reward and the uncountable space of the
belief vector, obtaining the optimal solution directly from the above recursive equations is computationally
prohibitive. Hence, a natural alternative is to seek a simple myopic sensing policy, which is easy to compute
and implement and which maximizes the expected immediate reward $F(\Omega(t))$, formally defined as follows:

$$\mathcal{A}(t) = \arg\max_{\mathcal{A}(t) \subseteq \mathcal{N}} F(\Omega_{\mathcal{A}}(t)), \tag{4}$$

where $\Omega_{\mathcal{A}}(t) = \{\omega_i(t), i \in \mathcal{A}(t)\}$.

In this paper, we focus on a class of generic and practically important functions defined in [2] as
regular functions. More specifically, the expected immediate reward function $F(\Omega(t))$ studied in this
paper is assumed to be symmetrical, monotonically non-decreasing and decomposable, as defined by the
three axioms in [2]. Under this condition, the myopic policy consists of choosing the $k$ channels with the
largest values of $\omega_i(t)$. In the following sections we focus on the structure and the optimality of the
myopic sensing policy under imperfect sensing. As pointed out in the remark following equations (1) and (2),
the main technical difficulties compared with the perfect sensing case are the non-linearity of the mapping
from $\omega_i(t)$ to $\omega_i(t+1)$ and the dependency of the channel state transition on the observation outcome.
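For a regular reward function, the myopic rule described above reduces to picking the $k$ largest beliefs. A minimal sketch, assuming 0-based channel indices and ties broken in favor of the lower index (both choices are assumptions made here, and `myopic_action` is a hypothetical name):

```python
from typing import List, Sequence


def myopic_action(beliefs: Sequence[float], k: int) -> List[int]:
    """Return the (sorted) indices of the k channels with the largest belief."""
    # Sort by decreasing belief; break ties by the smaller channel index.
    order = sorted(range(len(beliefs)), key=lambda i: (-beliefs[i], i))
    return sorted(order[:k])
```

With beliefs `[0.3, 0.7, 0.5, 0.6]` and `k = 2`, the rule selects channels 1 and 3.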

III. Analysis on Optimality of Myopic Sensing Policy under Imperfect Sensing 

The goal of this section is to establish closed-form conditions under which the myopic sensing policy,
despite its simple structure, achieves the system optimum under imperfect sensing. To this end, we
start by defining an auxiliary function and studying its structural properties, which serve as a basis
for the study of the optimality of the myopic sensing policy. We then establish the main result on the
optimality, followed by an illustration of how the obtained result can be applied via two concrete
application examples.

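For the convenience of discussion, we first introduce some notation before presenting the analysis:

• The belief vector $\Omega(t)$ is sorted to $[\omega_1(t), \ldots, \omega_N(t)]$ at each slot $t$ such that
$\mathcal{A} = \{1, 2, \ldots, k\}$³;

• $\mathcal{N}(m) = \{1, \ldots, m\}$ ($m \le N$) denotes the first $m$ channels in $\mathcal{N}$;

• Given $\mathcal{E} \subseteq \mathcal{M} \subseteq \mathcal{N}$,
$\Pr(\mathcal{M}, \mathcal{E}) = \prod_{i \in \mathcal{E}} (1-\epsilon)\omega_i(t) \prod_{j \in \mathcal{M} \setminus \mathcal{E}} [1 - (1-\epsilon)\omega_j(t)]$;
herein, $\Pr(\mathcal{M}, \mathcal{E})$ denotes the expected probability that the channels in $\mathcal{E}$ are sensed
in the good state, while the channels in $\mathcal{M} \setminus \mathcal{E}$ are sensed in the bad state, given that
the channels in $\mathcal{M}$ are sensed;

• $\mathbf{P}_{11}^{\mathcal{E}}$ denotes the vector of length $|\mathcal{E}|$ with each element being $p_{11}$;

• $\Phi(l, m) = [\tau(\omega_i(t)), l \le i \le m]$, where the components are sorted by channel index.
$\Phi(l, m)$ characterizes the updated belief values of the channels between $l$ and $m$ if they are not sensed;

• Given $\mathcal{E} \subseteq \mathcal{M} \subseteq \mathcal{N}$,
$\mathbf{Q}^{\mathcal{M},\mathcal{E}} = [\tau(\varphi(\omega_i(t))), i \in \mathcal{M} \setminus \mathcal{E}]$, where the components
are sorted by channel index. $\mathbf{Q}^{\mathcal{M},\mathcal{E}}$ characterizes the updated belief values of the
channels in $\mathcal{M} \setminus \mathcal{E}$ if they are sensed in the bad state;
$\underline{\mathbf{Q}}^{\mathcal{M},\mathcal{E},l} \triangleq [\tau(\varphi(\omega_i(t))), i \in \mathcal{M} \setminus \mathcal{E} \text{ and } i < l]$
characterizes the updated belief values of the channels in $\mathcal{M} \setminus \mathcal{E}$ if they are sensed in the
bad state with the channel index smaller than $l$;
$\overline{\mathbf{Q}}^{\mathcal{M},\mathcal{E},l} \triangleq [\tau(\varphi(\omega_i(t))), i \in \mathcal{M} \setminus \mathcal{E} \text{ and } i > l]$
characterizes the updated belief values of the channels in $\mathcal{M} \setminus \mathcal{E}$ if they are sensed in the
bad state with the channel index larger than $l$;

• Let $\omega_{-i} = \{\omega_j, j \in \mathcal{A}, j \ne i\}$ and

$$\Delta_{\max} = \max_{\omega_{-i} \in [0,1]^{k-1}} \{F(1, \omega_{-i}) - F(0, \omega_{-i})\},$$

$$\Delta_{\min} = \min_{\omega_{-i} \in [0,1]^{k-1}} \{F(1, \omega_{-i}) - F(0, \omega_{-i})\}.$$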

3 For presentation simplicity, by slightly abusing the notations without introducing ambiguity, we drop the time slot index t. 
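The outcome probability $\Pr(\mathcal{M}, \mathcal{E})$ from the notation list can be computed directly from its product form. The sketch below uses 0-based channel indices and the hypothetical names `outcome_prob` and `subsets`; it also shows that the probabilities of all outcomes $\mathcal{E} \subseteq \mathcal{M}$ add up to one, as they should.

```python
from itertools import combinations
from typing import Iterable, Iterator, Sequence, Tuple


def outcome_prob(beliefs: Sequence[float], sensed: Iterable[int],
                 acked: Iterable[int], eps: float) -> float:
    """Pr(M, E): channels in `acked` (E) succeed, the rest of `sensed` (M) fail."""
    acked_set = set(acked)
    prob = 1.0
    for i in sensed:
        success = (1.0 - eps) * beliefs[i]   # good and sensed good
        prob *= success if i in acked_set else 1.0 - success
    return prob


def subsets(items: Iterable[int]) -> Iterator[Tuple[int, ...]]:
    """Enumerate every subset E of the sensed set M."""
    items = tuple(items)
    for size in range(len(items) + 1):
        yield from combinations(items, size)


beliefs = [0.7, 0.5, 0.4]
sensed = (0, 1)
total = sum(outcome_prob(beliefs, sensed, e, 0.05) for e in subsets(sensed))
```

Here `total` equals 1 up to rounding, since the factors for each sensed channel sum to one.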
A. Definition and Properties of Auxiliary Value Function

In this subsection, inspired by the form of the value function $V_t(\Omega(t))$ and the analysis in [2], we first
define the auxiliary value function with imperfect sensing and then derive several fundamental properties
of the auxiliary value function, which are crucial in the study of the optimality of the myopic sensing
policy.

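Definition 1 (Auxiliary Value Function under Imperfect Sensing). The auxiliary value function, denoted
as $W_t(\Omega)$ ($t = 1, 2, \ldots, T$), is recursively defined as follows:

$$W_T(\Omega(T)) = F(\omega_1(T), \ldots, \omega_k(T)); \tag{5}$$

$$W_t(\Omega(t)) = F(\omega_1(t), \ldots, \omega_k(t)) + \beta \sum_{\mathcal{E} \subseteq \mathcal{N}(k)} \Pr(\mathcal{N}(k), \mathcal{E})\, W_{t+1}(\Omega_{\mathcal{E}}(t+1)), \tag{6}$$

where $\Omega_{\mathcal{E}}(t+1) = (\mathbf{P}_{11}^{\mathcal{E}}, \Phi(k+1, N), \mathbf{Q}^{\mathcal{N}(k),\mathcal{E}})$ denotes the belief vector generated by $\Omega(t)$ based on (1).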

The above recursively defined auxiliary value function gives the expected cumulative reward of the
following sensing policy: in slot $t$, sense the first $k$ channels; if a channel $i$ is correctly sensed idle
($S'_i = 1$ and $S_i = 1$), then put it at the top of the list to be sensed in the next slot; otherwise drop it
to the bottom of the list. Recalling Lemma 1 and Lemma 2, under the condition
$0 \le \epsilon \le \frac{(1-p_{11})p_{01}}{p_{11}(1-p_{01})}$, if the belief vector $\Omega(t)$ is ordered decreasingly in
slot $t$, the above sensing policy is the myopic sensing policy, with $W_t(\Omega(t))$ being the total reward
from slot $t$ to $T$.
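The recursion in Definition 1 can be evaluated by brute force for small instances. The following is a minimal sketch, not the authors' implementation: it assumes $\epsilon > 0$ (so that $\varphi$ is well defined on $[0, 1]$), takes the belief vector in sensing order, and uses the hypothetical name `aux_value`. The next belief vector is assembled as $(\mathbf{P}_{11}^{\mathcal{E}}, \Phi(k+1, N), \mathbf{Q}^{\mathcal{N}(k),\mathcal{E}})$.

```python
from itertools import combinations
from typing import Callable, Sequence


def aux_value(omega: Sequence[float], t: int, T: int, k: int,
              F: Callable[[Sequence[float]], float],
              p01: float, p11: float, eps: float, beta: float) -> float:
    """W_t for a belief vector listed in sensing order (first k are sensed)."""

    def tau(w: float) -> float:
        return w * p11 + (1.0 - w) * p01

    def phi(w: float) -> float:
        return eps * w / (eps * w + 1.0 - w)   # requires eps > 0

    value = F(omega[:k])
    if t == T:
        return value
    unsensed = tuple(tau(w) for w in omega[k:])       # Phi(k+1, N)
    for size in range(k + 1):
        for acked in combinations(range(k), size):    # the set E
            prob = 1.0
            failed = []                               # Q^{N(k), E}
            for i in range(k):
                success = (1.0 - eps) * omega[i]
                if i in acked:
                    prob *= success
                else:
                    prob *= 1.0 - success
                    failed.append(tau(phi(omega[i])))
            nxt = (p11,) * size + unsensed + tuple(failed)
            value += beta * prob * aux_value(nxt, t + 1, T, k, F,
                                             p01, p11, eps, beta)
    return value
```

The cost grows as $2^k$ per slot, so this is only meant for checking small cases, for example verifying numerically that $W_t$ is affine in each coordinate when $F(\omega_1, \ldots, \omega_k) = (1-\epsilon)\sum_i \omega_i$.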

In the subsequent analysis of this subsection, we prove some structural properties of the auxiliary value 
function. 

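Lemma 3 (Symmetry). If the expected reward function $F$ is regular, the corresponding auxiliary value
function $W_t(\Omega)$ is symmetrical in any two channels $i, j \le k$ for all $t = 1, 2, \ldots, T$, i.e.,

$$W_t(\omega_1, \ldots, \omega_i, \ldots, \omega_j, \ldots, \omega_N) = W_t(\omega_1, \ldots, \omega_j, \ldots, \omega_i, \ldots, \omega_N), \quad \forall i, j \le k. \tag{7}$$

Proof: The lemma can be easily shown by backward induction, noticing that
$(\omega_1, \ldots, \omega_i, \ldots, \omega_j, \ldots, \omega_N)$ and $(\omega_1, \ldots, \omega_j, \ldots, \omega_i, \ldots, \omega_N)$
generate the same belief vector $\Omega_{\mathcal{E}}(t+1)$ for any $\mathcal{E}$. ■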

Lemma 4 (Decomposability). If the expected reward function $F$ is regular, then the corresponding
auxiliary value function $W_t(\Omega(t))$ is decomposable for all $t = 1, 2, \ldots, T$, i.e.,

$$W_t(\omega_1, \ldots, \omega_l, \ldots, \omega_N) = \omega_l W_t(\omega_1, \ldots, 1, \ldots, \omega_N) + (1 - \omega_l) W_t(\omega_1, \ldots, 0, \ldots, \omega_N), \quad \forall l \in \mathcal{N}.$$

Proof: The proof is given in the appendix. ■

Lemma 4 can be applied one step further to prove the following corollary.

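Corollary 1. If the expected reward function $F$ is regular, then for any $l, m \in \mathcal{N}$ it holds that

$$W_t(\omega_1, \ldots, \omega_l, \ldots, \omega_m, \ldots, \omega_N) - W_t(\omega_1, \ldots, \omega_m, \ldots, \omega_l, \ldots, \omega_N)$$
$$= (\omega_l - \omega_m)\left[W_t(\omega_1, \ldots, 1, \ldots, 0, \ldots, \omega_N) - W_t(\omega_1, \ldots, 0, \ldots, 1, \ldots, \omega_N)\right], \quad t = 1, 2, \ldots, T,$$

where the displayed values 1 and 0 occupy positions $l$ and $m$.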
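The step from Lemma 4 to Corollary 1 can be written out explicitly. In the sketch below, $W^{ab}$ is a shorthand introduced only for this derivation (it does not appear in the paper) for $W_t$ evaluated with the value $a$ in position $l$ and $b$ in position $m$, all other entries being unchanged. Applying Lemma 4 once in each of the two positions gives:

```latex
\begin{align*}
W_t(\ldots,\omega_l,\ldots,\omega_m,\ldots)
 &= \omega_l\omega_m W^{11} + \omega_l(1-\omega_m) W^{10}
  + (1-\omega_l)\omega_m W^{01} + (1-\omega_l)(1-\omega_m) W^{00}, \\
W_t(\ldots,\omega_m,\ldots,\omega_l,\ldots)
 &= \omega_m\omega_l W^{11} + \omega_m(1-\omega_l) W^{10}
  + (1-\omega_m)\omega_l W^{01} + (1-\omega_m)(1-\omega_l) W^{00}, \\
\text{so the difference is}\quad
 &\big[\omega_l(1-\omega_m) - \omega_m(1-\omega_l)\big]\big(W^{10} - W^{01}\big)
  = (\omega_l - \omega_m)\big(W^{10} - W^{01}\big).
\end{align*}
```

The $W^{11}$ and $W^{00}$ terms cancel because their coefficients are symmetric in $\omega_l$ and $\omega_m$.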



Lemma 5 (Monotonicity). If the expected reward function $F$ is regular, the corresponding auxiliary value
function $W_t(\Omega)$ is monotonically non-decreasing in $\omega_l$, $\forall l \in \mathcal{N}$, i.e.,

$$\omega'_l \ge \omega_l \implies W_t(\omega_1, \ldots, \omega'_l, \ldots, \omega_N) \ge W_t(\omega_1, \ldots, \omega_l, \ldots, \omega_N).$$

Proof: The proof is given in the appendix. ■

B. Optimality of Myopic Sensing under Imperfect Sensing 

In this subsection, we study the optimality of the myopic sensing policy under imperfect sensing. We start
by showing the following important auxiliary lemmas (Lemma 6 and Lemma 7) and then establish the
sufficient condition under which the optimality of the myopic sensing policy is guaranteed.

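Lemma 6. Given that (1) $F$ is regular, (2) $\epsilon \le \frac{p_{01}(1-p_{11})}{p_{11}(1-p_{01})}$, and
(3) $\beta \le \frac{\Delta_{\min}}{\Delta_{\max}\left[(1-\epsilon)(1-p_{01}) + \frac{\epsilon(p_{11}-p_{01})}{1-(1-\epsilon)(p_{11}-p_{01})}\right]}$,
if $p_{11} \ge \omega_l \ge \omega_m \ge p_{01}$ where $l < m$, then it holds that

$$W_t(\omega_1, \ldots, \omega_l, \ldots, \omega_m, \ldots, \omega_N) \ge W_t(\omega_1, \ldots, \omega_m, \ldots, \omega_l, \ldots, \omega_N), \quad t = 1, \ldots, T.$$

Lemma 7. Given that (1) $F$ is regular, (2) $\epsilon \le \frac{p_{01}(1-p_{11})}{p_{11}(1-p_{01})}$, and
(3) $\beta \le \frac{\Delta_{\min}}{\Delta_{\max}\left[(1-\epsilon)(1-p_{01}) + \frac{\epsilon(p_{11}-p_{01})}{1-(1-\epsilon)(p_{11}-p_{01})}\right]}$,
if $p_{11} \ge \omega_1 \ge \cdots \ge \omega_N \ge p_{01}$, for any $1 \le t \le T$, it holds that

$$W_t(\omega_1, \ldots, \omega_{N-1}, \omega_N) - W_t(\omega_N, \omega_1, \ldots, \omega_{N-1}) \le (1 - \omega_N)\Delta_{\max},$$

$$W_t(\omega_1, \omega_2, \ldots, \omega_{N-1}, \omega_N) - W_t(\omega_N, \omega_2, \ldots, \omega_{N-1}, \omega_1) \le (p_{11} - p_{01})\Delta_{\max} \frac{1 - [\beta(1-\epsilon)(p_{11}-p_{01})]^{T-t+1}}{1 - \beta(1-\epsilon)(p_{11}-p_{01})}.$$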

Lemma 6 states that by swapping two elements in $\Omega$ with the former larger than the latter, the user
does not increase the total expected reward. Lemma 7, on the other hand, gives upper bounds on the
difference of the total reward for two swapping operations: swapping $\omega_N$ and $\omega_k$
($k = N-1, \ldots, 1$), and swapping $\omega_1$ and $\omega_N$, respectively. For clarity of presentation, the
detailed proofs of the two lemmas are deferred to the Appendix. From a technical point of view, it is
insightful to compare the methodology in the proof with that in the analysis presented in [4] for the
perfect sensing case with $k = 1$. The key point of the analysis in [4] lies in the coupling argument
leading to Lemma 3 in [4]. This analysis, however, cannot be directly applied in the generic case with
imperfect sensing due to the non-linearity of the belief vector update, as stated in the remark after
equation (1). Hence, we base our analysis on the intrinsic structure of the auxiliary value function $W$
and investigate the different "branches" of channel realizations to derive the relevant bounds, which are
further applied to study the optimality of the myopic sensing policy, as stated in the following theorem.

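Theorem 1. If $p_{01} \le \omega_i(1) \le p_{11}$, $1 \le i \le N$, the myopic sensing policy is optimal if the
following conditions hold: (1) $F(\Omega)$ is regular; (2) $\epsilon \le \frac{p_{01}(1-p_{11})}{p_{11}(1-p_{01})}$;
(3) $\beta \le \frac{\Delta_{\min}}{\Delta_{\max}\left[(1-\epsilon)(1-p_{01}) + \frac{\epsilon(p_{11}-p_{01})}{1-(1-\epsilon)(p_{11}-p_{01})}\right]}$.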

Proof: It suffices to show that for $t = 1, \ldots, T$, by sorting $\Omega(t)$ in decreasing order such that
$\omega_1 \ge \cdots \ge \omega_N$, it holds that $W_t(\omega_1, \ldots, \omega_N) \ge W_t(\omega_{i_1}, \ldots, \omega_{i_N})$,
where $(i_1, \ldots, i_N)$ is any permutation of $(1, \ldots, N)$.

We prove the above inequality by contradiction. Assume that the maximum of $W_t$ is achieved at
$(\omega_{i^*_1}, \ldots, \omega_{i^*_N}) \ne (\omega_1, \ldots, \omega_N)$, i.e.,

$$W_t(\omega_{i^*_1}, \ldots, \omega_{i^*_N}) > W_t(\omega_1, \ldots, \omega_N). \tag{8}$$

However, run a bubble sort algorithm on $(\omega_{i^*_1}, \ldots, \omega_{i^*_N})$ by repeatedly stepping through it,
comparing each pair of adjacent elements $\omega_{i^*_l}$ and $\omega_{i^*_{l+1}}$ and swapping them if
$\omega_{i^*_l} < \omega_{i^*_{l+1}}$. Note that when the algorithm terminates, the channel belief vector is sorted
decreasingly, that is to say, it becomes $(\omega_1, \ldots, \omega_N)$. By applying Lemma 6 at each swap, we have
$W_t(\omega_{i^*_1}, \ldots, \omega_{i^*_N}) \le W_t(\omega_1, \ldots, \omega_N)$, which contradicts (8). Theorem 1 is thus proven. ■
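The bubble-sort step of the proof can be made concrete with a short sketch. It records every adjacent exchange; each recorded exchange moves a larger belief in front of a smaller one, which is exactly the situation in which Lemma 6 says the auxiliary value does not decrease. The function name `bubble_sort_descending` is hypothetical.

```python
from typing import List, Sequence, Tuple


def bubble_sort_descending(beliefs: Sequence[float]) -> Tuple[List[float], List[Tuple[int, int]]]:
    """Sort beliefs in decreasing order, logging each adjacent swap (l, l+1)."""
    values = list(beliefs)
    swaps: List[Tuple[int, int]] = []
    swapped = True
    while swapped:
        swapped = False
        for l in range(len(values) - 1):
            if values[l] < values[l + 1]:      # out of order: smaller one in front
                values[l], values[l + 1] = values[l + 1], values[l]
                swaps.append((l, l + 1))
                swapped = True
    return values, swaps
```

Since the procedure terminates after finitely many swaps and none of them lowers $W_t$, the sorted vector attains a value at least as large as that of any starting permutation.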
As noted in 0]], when the initial belief $\omega_i$ is set to $\frac{p_{01}}{p_{01} + 1 - p_{11}}$, as is often the case in
practical systems, it can be checked that $p_{01} \le \omega_i(1) \le p_{11}$ holds. Moreover, even if the initial
belief does not fall in $[p_{01}, p_{11}]$, all the belief values are bounded in this interval from the second
slot onward following Lemma 1. Hence our results can be extended by treating the first slot separately
from the future slots.
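The claim about the initial belief is easy to verify numerically. The sketch below assumes the initial belief is the stationary probability of the good state, $p_{01} / (p_{01} + 1 - p_{11})$; `stationary_belief` is a hypothetical helper name.

```python
def stationary_belief(p01: float, p11: float) -> float:
    """Stationary probability that a channel is in the good state."""
    return p01 / (p01 + 1.0 - p11)


def tau(omega: float, p01: float, p11: float) -> float:
    """One-step belief prediction for an unsensed channel, as in equation (2)."""
    return omega * p11 + (1.0 - omega) * p01


p01, p11 = 0.2, 0.8
w0 = stationary_belief(p01, p11)
inside = p01 <= w0 <= p11                              # initial belief lies in [p01, p11]
fixed_point = abs(tau(w0, p01, p11) - w0) < 1e-12      # tau leaves it unchanged
```

For positively correlated channels ($p_{01} \le p_{11}$) the stationary belief always lies in $[p_{01}, p_{11}]$, and it is a fixed point of $\tau$, so an unsensed channel starting there keeps the same belief.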

C. Discussion 

In this subsection, we illustrate the application of the result obtained above in two concrete scenarios 
and compare our work with the existing results. 

Consider the channel access problem in which the user is limited to sense $k$ channels and gets one
unit of reward if a sensed channel is in the good state, i.e., the utility function can be formulated as
$F(\Omega_{\mathcal{A}}) = (1 - \epsilon) \sum_{i \in \mathcal{A}} \omega_i$. Note that the optimality of the myopic sensing
policy under this model is studied in [1] for a subset of scenarios where $k = 1$, $N = 2$. We now study the
generic case with $k, N \ge 2$. To that end, we apply Theorem 1. Notice that in this example, we have
$\Delta_{\min} = \Delta_{\max} = 1 - \epsilon$. We can then verify that when
$\epsilon \le \frac{p_{01}(1-p_{11})}{p_{11}(1-p_{01})}$, it holds that

$$\frac{\Delta_{\min}}{\Delta_{\max}\left[(1-\epsilon)(1-p_{01}) + \frac{\epsilon(p_{11}-p_{01})}{1-(1-\epsilon)(p_{11}-p_{01})}\right]} \ge 1.$$

Therefore, when conditions (1) and (2) hold, the myopic sensing policy is optimal for any $\beta$. This
result in generic cases significantly extends the results obtained in [1], where the optimality of the
myopic policy is proved for the case of two channels and only conjectured for general cases.
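The bound on the discount factor can be evaluated numerically for this example. The sketch below is based on an assumption: it reads condition (3) as $\beta \le \Delta_{\min} / \{\Delta_{\max}[(1-\epsilon)(1-p_{01}) + \epsilon(p_{11}-p_{01}) / (1-(1-\epsilon)(p_{11}-p_{01}))]\}$, a form reconstructed from the surrounding text, and `beta_bound` is a hypothetical name.

```python
def beta_bound(delta_min: float, delta_max: float,
               p01: float, p11: float, eps: float) -> float:
    """Right-hand side of condition (3), under the assumed form stated above."""
    d = p11 - p01
    bracket = (1.0 - eps) * (1.0 - p01) + eps * d / (1.0 - (1.0 - eps) * d)
    return delta_min / (delta_max * bracket)


# First example: F = (1 - eps) * sum of beliefs, so delta_min = delta_max = 1 - eps.
p01, p11 = 0.2, 0.8
eps = (1.0 - p11) * p01 / (p11 * (1.0 - p01))   # largest eps allowed by condition (2)
bound = beta_bound(1.0 - eps, 1.0 - eps, p01, p11, eps)
```

With these values the bound is larger than 1, so any discount factor $\beta \in [0, 1]$ satisfies condition (3); with $\epsilon = 0$ the same expression reduces to $1 / (1 - p_{01})$.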

Next, consider another scenario where the user can sense $k$ channels but can choose only one of them
to transmit its packets. Under this model, the user wants to maximize its expected throughput. More
specifically, the slot utility function is $F = F(\Omega_{\mathcal{A}}) = 1 - \prod_{i \in \mathcal{A}} [1 - (1 - \epsilon)\omega_i]$, which is regular. In this

context, we have A max = (1 — e) fe_1 j?^ 1 and A m j n = (1 — e)* -1 ^^ 1 . The third condition on for the 

fc-i 

myopic policy to be optimal becomes /3 < k _ 1 - ^ t(P11 _ P0l) — r. Particularly, when e = 0, 

fc-i 

P < p y I* can b e note d that even when there is no sensing error, the myopic policy is not 

ensured to be optimal, which confirms our findings in previous work ||5l on perfect sensing scenarios. 

IV. Related Work 

Due to its application in numerous engineering problems, the restless multi-armed bandit (RMAB)
problem is of fundamental importance in stochastic decision theory. However, finding the optimal policy
in the generic RMAB problem is shown to be PSPACE-hard by Papadimitriou et al. in [6]. Whittle
proposed a heuristic index policy, called the Whittle index policy [7], which is shown to be asymptotically
optimal in a certain limited regime under some specific constraints [8]. Unfortunately, not every RMAB
problem has a well-defined Whittle index. Moreover, computing the Whittle index can be prohibitively
complex. In this regard, Liu et al. studied in [9] the indexability of a class of RMAB problems relevant
to dynamic multi-channel access applications. However, the optimality of the myopic policy based on
the Whittle index is not ensured in general, especially when the arms follow non-identical Markov
chains.

A natural alternative, given that the RMAB problem is not tractable, is to seek simple myopic
policies maximizing the short-term reward. In this line of research, significant efforts have
been devoted to studying the performance gap between the myopic policy and the optimal one and to
designing approximation algorithms and heuristic policies (cf. [10], [11], [12]). Specifically, a simple
myopic policy, termed the greedy policy, is developed in [10] that yields a factor 2 approximation of
the optimal policy for a subclass of scenarios referred to as Monotone bandits. Recently, the RMAB
problem has found application in opportunistic channel access, which has motivated the study of the
myopic sensing policy in this context. More specifically, the structure of the myopic sensing policy is
studied in [13]. The optimality of the myopic sensing policy is derived in [4] for positively correlated
channels when the sender is limited to choosing one channel each time (i.e., $k = 1$). The result is further
extended to the case of sensing multiple channels ($k > 1$) in Q for a particular form of
utility function modeling the fact that the user gets one unit of reward for each channel sensed good. A
separation principle has been established in [11] which reveals the optimality of the myopic approach in
the design of the channel state detector and the access policy. Our previous work [2], [14] adopts another
line of research by focusing on a family of generic and practically important utility functions and deriving
closed-form conditions under which the myopic sensing policy is ensured to be optimal. In the context
of imperfect sensing, the optimality of the myopic sensing policy is proved for the case of $N = 2$ and
$k = 1$ in [1]. Our work presented in this paper contributes to the literature by deriving closed-form
conditions on the optimality of the myopic sensing policy with imperfect sensing in the general case.

V. Conclusion 

In this paper, we have investigated the problem of opportunistic channel access under imperfect channel
state sensing. We have derived closed-form conditions under which the myopic sensing policy is ensured
to be optimal. Due to the generic RMAB formulation of the problem, the obtained results and the analysis
methodology presented in this paper apply to a wide range of domains.



January 26, 2013 



DRAFT 




Appendix A 
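Proof of Lemma 4

We prove the lemma by backward induction. Firstly, it is easy to verify that the lemma holds for slot $T$.

Assume that the lemma holds for slots $t+1, \ldots, T$; we now prove that it also holds for slot $t$ by
distinguishing the following two cases.

• Case 1: channel $l$ is not sensed in slot $t$, i.e., $l \ge k+1$. Let $\mathcal{M} = \mathcal{N}(k) = \{1, \ldots, k\}$.
For a generic $\omega_l$, and for $\omega_l = 0$ and $\omega_l = 1$, respectively, we have

$$W_t(\omega_1, \ldots, \omega_l, \ldots, \omega_N) = F(\omega_1, \ldots, \omega_k) + \beta \sum_{\mathcal{E} \subseteq \mathcal{M}} \Pr(\mathcal{M}, \mathcal{E})\, W_{t+1}(\Omega_{\mathcal{E}}^{l}(t+1)),$$

$$W_t(\omega_1, \ldots, 0, \ldots, \omega_N) = F(\omega_1, \ldots, \omega_k) + \beta \sum_{\mathcal{E} \subseteq \mathcal{M}} \Pr(\mathcal{M}, \mathcal{E})\, W_{t+1}(\Omega_{\mathcal{E}}^{l,0}(t+1)),$$

$$W_t(\omega_1, \ldots, 1, \ldots, \omega_N) = F(\omega_1, \ldots, \omega_k) + \beta \sum_{\mathcal{E} \subseteq \mathcal{M}} \Pr(\mathcal{M}, \mathcal{E})\, W_{t+1}(\Omega_{\mathcal{E}}^{l,1}(t+1)),$$

where

$$\Omega_{\mathcal{E}}^{l}(t+1) = (\mathbf{P}_{11}^{\mathcal{E}}, \Phi(k+1, l-1), \tau(\omega_l), \Phi(l+1, N), \mathbf{Q}^{\mathcal{M},\mathcal{E}}),$$

$$\Omega_{\mathcal{E}}^{l,0}(t+1) = (\mathbf{P}_{11}^{\mathcal{E}}, \Phi(k+1, l-1), p_{01}, \Phi(l+1, N), \mathbf{Q}^{\mathcal{M},\mathcal{E}}),$$

$$\Omega_{\mathcal{E}}^{l,1}(t+1) = (\mathbf{P}_{11}^{\mathcal{E}}, \Phi(k+1, l-1), p_{11}, \Phi(l+1, N), \mathbf{Q}^{\mathcal{M},\mathcal{E}}).$$

To prove the lemma in this case, it is sufficient to prove

$$W_{t+1}(\Omega_{\mathcal{E}}^{l}(t+1)) = (1 - \omega_l)\, W_{t+1}(\Omega_{\mathcal{E}}^{l,0}(t+1)) + \omega_l\, W_{t+1}(\Omega_{\mathcal{E}}^{l,1}(t+1)). \tag{9}$$

According to the induction hypothesis, we have

$$W_{t+1}(\Omega_{\mathcal{E}}^{l}(t+1)) = \tau(\omega_l) \cdot W_{t+1}(\mathbf{P}_{11}^{\mathcal{E}}, \Phi(k+1, l-1), 1, \Phi(l+1, N), \mathbf{Q}^{\mathcal{M},\mathcal{E}}) + (1 - \tau(\omega_l)) \cdot W_{t+1}(\mathbf{P}_{11}^{\mathcal{E}}, \Phi(k+1, l-1), 0, \Phi(l+1, N), \mathbf{Q}^{\mathcal{M},\mathcal{E}}), \tag{10}$$

$$W_{t+1}(\Omega_{\mathcal{E}}^{l,0}(t+1)) = p_{01} \cdot W_{t+1}(\mathbf{P}_{11}^{\mathcal{E}}, \Phi(k+1, l-1), 1, \Phi(l+1, N), \mathbf{Q}^{\mathcal{M},\mathcal{E}}) + (1 - p_{01}) \cdot W_{t+1}(\mathbf{P}_{11}^{\mathcal{E}}, \Phi(k+1, l-1), 0, \Phi(l+1, N), \mathbf{Q}^{\mathcal{M},\mathcal{E}}), \tag{11}$$

$$W_{t+1}(\Omega_{\mathcal{E}}^{l,1}(t+1)) = p_{11} \cdot W_{t+1}(\mathbf{P}_{11}^{\mathcal{E}}, \Phi(k+1, l-1), 1, \Phi(l+1, N), \mathbf{Q}^{\mathcal{M},\mathcal{E}}) + (1 - p_{11}) \cdot W_{t+1}(\mathbf{P}_{11}^{\mathcal{E}}, \Phi(k+1, l-1), 0, \Phi(l+1, N), \mathbf{Q}^{\mathcal{M},\mathcal{E}}). \tag{12}$$

Combining (10), (11) and (12), and noting that $\tau(\omega_l) = (1 - \omega_l) p_{01} + \omega_l p_{11}$, we have (9).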
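Case 2: channel $l$ is sensed in slot $t$, i.e., $l \le k$. Let $\mathcal{M} = \mathcal{N}(k) \setminus \{l\} = \{1, \cdots, l-1, l+1, \cdots, k\}$. According to the recursion defining $W_t$, we have

$$\begin{aligned}
W_t(\Omega(t)) ={}& F(\omega_1, \cdots, \omega_k) \\
&+ \beta(1-\epsilon)\omega_l \sum_{\mathcal{E}\subseteq\mathcal{M}} \Pr(\mathcal{M},\mathcal{E})\, W_{t+1}\big(P^{\mathcal{E}}_{11}, p_{11}, \Phi(k+1,N), Q^{\mathcal{M},\mathcal{E},l}, \bar{Q}^{\mathcal{M},\mathcal{E},l}\big) \\
&+ \beta[1-(1-\epsilon)\omega_l] \sum_{\mathcal{E}\subseteq\mathcal{M}} \Pr(\mathcal{M},\mathcal{E})\, W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N), Q^{\mathcal{M},\mathcal{E},l}, \tau(\varphi(\omega_l)), \bar{Q}^{\mathcal{M},\mathcal{E},l}\big).
\end{aligned}$$

Setting $\omega_l = 0$ and $\omega_l = 1$, respectively, we have

$$W_t(\omega_1, \cdots, 0, \cdots, \omega_N) = F(\omega_1, \cdots, 0, \cdots, \omega_k) + \beta \sum_{\mathcal{E}\subseteq\mathcal{M}} \Pr(\mathcal{M},\mathcal{E})\, W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N), Q^{\mathcal{M},\mathcal{E},l}, p_{01}, \bar{Q}^{\mathcal{M},\mathcal{E},l}\big);$$

$$\begin{aligned}
W_t(\omega_1, \cdots, 1, \cdots, \omega_N) ={}& F(\omega_1, \cdots, 1, \cdots, \omega_k) \\
&+ \beta(1-\epsilon) \sum_{\mathcal{E}\subseteq\mathcal{M}} \Pr(\mathcal{M},\mathcal{E})\, W_{t+1}\big(P^{\mathcal{E}}_{11}, p_{11}, \Phi(k+1,N), Q^{\mathcal{M},\mathcal{E},l}, \bar{Q}^{\mathcal{M},\mathcal{E},l}\big) \\
&+ \beta\epsilon \sum_{\mathcal{E}\subseteq\mathcal{M}} \Pr(\mathcal{M},\mathcal{E})\, W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N), Q^{\mathcal{M},\mathcal{E},l}, p_{11}, \bar{Q}^{\mathcal{M},\mathcal{E},l}\big).
\end{aligned}$$

To prove the lemma in this case, it is sufficient to show

$$\begin{aligned}
&[1-(1-\epsilon)\omega_l]\, W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N), Q^{\mathcal{M},\mathcal{E},l}, \tau(\varphi(\omega_l)), \bar{Q}^{\mathcal{M},\mathcal{E},l}\big) \\
&\quad = (1-\omega_l)\, W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N), Q^{\mathcal{M},\mathcal{E},l}, p_{01}, \bar{Q}^{\mathcal{M},\mathcal{E},l}\big) \\
&\qquad + \epsilon\omega_l\, W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N), Q^{\mathcal{M},\mathcal{E},l}, p_{11}, \bar{Q}^{\mathcal{M},\mathcal{E},l}\big).
\end{aligned} \tag{13}$$

According to the induction result, we have

$$\begin{aligned}
&W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N), Q^{\mathcal{M},\mathcal{E},l}, \tau(\varphi(\omega_l)), \bar{Q}^{\mathcal{M},\mathcal{E},l}\big) \\
&\quad = \tau(\varphi(\omega_l))\, W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N), Q^{\mathcal{M},\mathcal{E},l}, 1, \bar{Q}^{\mathcal{M},\mathcal{E},l}\big) \\
&\qquad + (1-\tau(\varphi(\omega_l)))\, W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N), Q^{\mathcal{M},\mathcal{E},l}, 0, \bar{Q}^{\mathcal{M},\mathcal{E},l}\big),
\end{aligned} \tag{14}$$

$$\begin{aligned}
&W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N), Q^{\mathcal{M},\mathcal{E},l}, p_{01}, \bar{Q}^{\mathcal{M},\mathcal{E},l}\big) \\
&\quad = p_{01}\, W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N), Q^{\mathcal{M},\mathcal{E},l}, 1, \bar{Q}^{\mathcal{M},\mathcal{E},l}\big) \\
&\qquad + (1-p_{01})\, W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N), Q^{\mathcal{M},\mathcal{E},l}, 0, \bar{Q}^{\mathcal{M},\mathcal{E},l}\big),
\end{aligned} \tag{15}$$

$$\begin{aligned}
&W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N), Q^{\mathcal{M},\mathcal{E},l}, p_{11}, \bar{Q}^{\mathcal{M},\mathcal{E},l}\big) \\
&\quad = p_{11}\, W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N), Q^{\mathcal{M},\mathcal{E},l}, 1, \bar{Q}^{\mathcal{M},\mathcal{E},l}\big) \\
&\qquad + (1-p_{11})\, W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N), Q^{\mathcal{M},\mathcal{E},l}, 0, \bar{Q}^{\mathcal{M},\mathcal{E},l}\big).
\end{aligned} \tag{16}$$

Combining (14), (15) and (16), we have (13).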
Combining the analysis of the two cases, we thus prove Lemma 4.
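The passage from (14)-(16) to (13) rests on a scalar identity between the belief-update maps. The sketch below checks that identity with exact rational arithmetic. It assumes the usual forms $\tau(\omega) = p_{11}\omega + p_{01}(1-\omega)$ and $\varphi(\omega) = \epsilon\omega / (1-(1-\epsilon)\omega)$, which are not restated in this appendix; the function names are illustrative only.

```python
from fractions import Fraction


def tau(w, p01, p11):
    """One-step Markov prediction of a belief (assumed form)."""
    return p11 * w + p01 * (1 - w)


def phi(w, eps):
    """Posterior belief after a sensed slot without ACK (assumed form)."""
    return eps * w / (1 - (1 - eps) * w)


def identity_gap(w, eps, p01, p11):
    """Left side minus right side of
    [1 - (1 - eps) w] * tau(phi(w)) = (1 - w) p01 + eps * w * p11.
    A zero result means the weights in (13) are consistent."""
    lhs = (1 - (1 - eps) * w) * tau(phi(w, eps), p01, p11)
    rhs = (1 - w) * p01 + eps * w * p11
    return lhs - rhs


if __name__ == "__main__":
    gap = identity_gap(Fraction(2, 5), Fraction(1, 10), Fraction(1, 5), Fraction(4, 5))
    print("gap:", gap)
```

Since $W_{t+1}$ is affine in the belief of channel $l$ by the induction hypothesis, this scalar identity is all that is needed to match the coefficients of the "1" and "0" terms on both sides of (13).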

Appendix B 
Proof of Lemma 5

We prove the lemma by backward induction. First, it is easy to verify that the lemma holds for slot $T$.

Assume that the lemma holds for slots $t+1, \cdots, T$. We now prove that it also holds for slot $t$ by distinguishing the following two cases.

• Case 1: channel $l$ is not sensed in slot $t$, i.e., $l \ge k+1$. In this case, the immediate reward is unrelated to $\omega_l$ and $\omega'_l$. Moreover, let $\Omega(t+1)$ and $\Omega'(t+1)$ denote the belief vectors generated by $\Omega(t) = (\omega_1, \cdots, \omega_l, \cdots, \omega_N)$ and $\Omega'(t) = (\omega_1, \cdots, \omega'_l, \cdots, \omega_N)$, respectively. It can be noticed that $\Omega(t+1)$ and $\Omega'(t+1)$ differ in only one element: $\omega'_l(t+1) \ge \omega_l(t+1)$. By induction, it holds that $W_{t+1}(\Omega'(t+1)) \ge W_{t+1}(\Omega(t+1))$. From the recursion defining $W_t$, it follows that $W_t(\Omega'(t)) \ge W_t(\Omega(t))$.

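• Case 2: channel $l$ is sensed in slot $t$, i.e., $l \le k$. Following Lemma 4 and after some straightforward algebraic operations, we have

$$W_t(\Omega'(t)) - W_t(\Omega(t)) = (\omega'_l - \omega_l)\,[W_t(\omega_1, \cdots, 1, \cdots, \omega_N) - W_t(\omega_1, \cdots, 0, \cdots, \omega_N)].$$

Let $\mathcal{M} = \mathcal{N}(k) \setminus \{l\} = \{1, \cdots, l-1, l+1, \cdots, k\}$. By developing $W_t(\Omega(t))$ as a function of $\omega_l$, we have

$$\begin{aligned}
W_t(\Omega(t)) ={}& F(\omega_1(t), \cdots, \omega_k(t)) + \beta(1-\epsilon)\omega_l \sum_{\mathcal{E}\subseteq\mathcal{M}} \Pr(\mathcal{M},\mathcal{E})\, W_{t+1}(\Omega^{\mathcal{E}}_{1-\epsilon}(t+1)) \\
&+ \beta[1-(1-\epsilon)\omega_l] \sum_{\mathcal{E}\subseteq\mathcal{M}} \Pr(\mathcal{M},\mathcal{E})\, W_{t+1}(\Omega^{\mathcal{E}}(t+1)).
\end{aligned}$$

Setting $\omega_l = 0$ and $\omega_l = 1$, respectively, we have

$$W_t(\omega_1, \cdots, 0, \cdots, \omega_N) = F(\omega_1, \cdots, 0, \cdots, \omega_k) + \beta \sum_{\mathcal{E}\subseteq\mathcal{M}} \Pr(\mathcal{M},\mathcal{E})\, W_{t+1}(\Omega^{\mathcal{E}}_{0}(t+1)),$$

$$\begin{aligned}
W_t(\omega_1, \cdots, 1, \cdots, \omega_N) ={}& F(\omega_1, \cdots, 1, \cdots, \omega_k) + \beta(1-\epsilon) \sum_{\mathcal{E}\subseteq\mathcal{M}} \Pr(\mathcal{M},\mathcal{E})\, W_{t+1}(\Omega^{\mathcal{E}}_{1-\epsilon}(t+1)) \\
&+ \beta\epsilon \sum_{\mathcal{E}\subseteq\mathcal{M}} \Pr(\mathcal{M},\mathcal{E})\, W_{t+1}(\Omega^{\mathcal{E}}_{\epsilon}(t+1)),
\end{aligned}$$

where

$$\Omega^{\mathcal{E}}_{0}(t+1) = \big(P^{\mathcal{E}}_{11}, \Phi(k+1,N), Q^{\mathcal{M},\mathcal{E},l}, p_{01}, \bar{Q}^{\mathcal{M},\mathcal{E},l}\big),$$
$$\Omega^{\mathcal{E}}_{1-\epsilon}(t+1) = \big(P^{\mathcal{E}}_{11}, p_{11}, \Phi(k+1,N), Q^{\mathcal{M},\mathcal{E},l}, \bar{Q}^{\mathcal{M},\mathcal{E},l}\big),$$
$$\Omega^{\mathcal{E}}_{\epsilon}(t+1) = \big(P^{\mathcal{E}}_{11}, \Phi(k+1,N), Q^{\mathcal{M},\mathcal{E},l}, p_{11}, \bar{Q}^{\mathcal{M},\mathcal{E},l}\big).$$

It can be checked that $\Omega^{\mathcal{E}}_{1-\epsilon}(t+1) \ge \Omega^{\mathcal{E}}_{0}(t+1)$ and $\Omega^{\mathcal{E}}_{\epsilon}(t+1) \ge \Omega^{\mathcal{E}}_{0}(t+1)$. It then follows from induction that, given $\mathcal{E}$, $W_{t+1}(\Omega^{\mathcal{E}}_{1-\epsilon}(t+1)) \ge W_{t+1}(\Omega^{\mathcal{E}}_{0}(t+1))$ and $W_{t+1}(\Omega^{\mathcal{E}}_{\epsilon}(t+1)) \ge W_{t+1}(\Omega^{\mathcal{E}}_{0}(t+1))$. Noticing that $F$ is increasing, we then have

$$\begin{aligned}
&W_t(\omega_1, \cdots, 1, \cdots, \omega_N) - W_t(\omega_1, \cdots, 0, \cdots, \omega_N) \\
&\quad = F(\omega_1, \cdots, 1, \cdots, \omega_k) - F(\omega_1, \cdots, 0, \cdots, \omega_k) \\
&\qquad + \beta(1-\epsilon) \sum_{\mathcal{E}\subseteq\mathcal{M}} \Pr(\mathcal{M},\mathcal{E})\,[W_{t+1}(\Omega^{\mathcal{E}}_{1-\epsilon}(t+1)) - W_{t+1}(\Omega^{\mathcal{E}}_{0}(t+1))] \\
&\qquad + \beta\epsilon \sum_{\mathcal{E}\subseteq\mathcal{M}} \Pr(\mathcal{M},\mathcal{E})\,[W_{t+1}(\Omega^{\mathcal{E}}_{\epsilon}(t+1)) - W_{t+1}(\Omega^{\mathcal{E}}_{0}(t+1))] \ge 0.
\end{aligned}$$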

Combining the above analysis in two cases completes our proof. 

Appendix C 
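Proof of Lemma 6 and Lemma 7

Due to the dependency between the two lemmas, we prove them together by backward induction.
We first show that Lemma 6 and Lemma 7 hold for slot $T$. It is easy to verify that Lemma 6 holds.

We then prove Lemma 7. Noticing that the beliefs involved lie between $p_{01}$ and $p_{11} \le 1$, we have

$$\begin{aligned}
W_T(\omega_1, \cdots, \omega_N) - W_T(\omega_N, \omega_1, \cdots, \omega_{N-1}) &= F(\omega_1, \cdots, \omega_k) - F(\omega_N, \omega_1, \cdots, \omega_{k-1}) \\
&= (\omega_k - \omega_N)\,[F(\omega_1, \cdots, \omega_{k-1}, 1) - F(\omega_1, \cdots, \omega_{k-1}, 0)] \le (1-\omega_N)\Delta_{\max},
\end{aligned}$$

$$\begin{aligned}
W_T(\omega_1, \cdots, \omega_N) - W_T(\omega_N, \omega_2, \cdots, \omega_{N-1}, \omega_1) &= F(\omega_1, \cdots, \omega_k) - F(\omega_N, \omega_2, \cdots, \omega_k) \\
&= (\omega_1 - \omega_N)\,[F(1, \omega_2, \cdots, \omega_k) - F(0, \omega_2, \cdots, \omega_k)] \le (p_{11} - p_{01})\Delta_{\max}.
\end{aligned}$$

Lemma 7 thus holds for slot $T$.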
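As a concrete instance of the slot-$T$ computation, the sketch below uses a hypothetical utility $F(\omega_1, \cdots, \omega_k) = 1 - \prod_i (1-\omega_i)$ (the probability that at least one sensed channel is good). This choice is an assumption made only for illustration, because it is symmetric and affine in each argument; it is not claimed to be the utility used in the paper. The code checks that shifting $\omega_N$ into the sensed set changes $F$ by exactly $(\omega_k - \omega_N)[F(\cdots, 1) - F(\cdots, 0)]$.

```python
from fractions import Fraction
from math import prod


def utility(beliefs):
    """Hypothetical utility: probability that at least one channel is good."""
    return 1 - prod(1 - w for w in beliefs)


def shift_gap(omega, k):
    """F(w_1..w_k) - F(w_N, w_1..w_{k-1}) for a belief list omega of length N."""
    return utility(omega[:k]) - utility([omega[-1]] + omega[:k - 1])


def factored_gap(omega, k):
    """(w_k - w_N) * [F(w_1..w_{k-1}, 1) - F(w_1..w_{k-1}, 0)]."""
    head = omega[:k - 1]
    return (omega[k - 1] - omega[-1]) * (utility(head + [1]) - utility(head + [0]))


if __name__ == "__main__":
    omega = [Fraction(4, 5), Fraction(3, 5), Fraction(1, 2), Fraction(3, 10)]
    print(shift_gap(omega, 2), factored_gap(omega, 2))
```

For this particular utility the bracketed difference never exceeds 1, so the gap is at most $1 - \omega_N$, in line with the first bound above when $\Delta_{\max} = 1$.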

Assume that Lemma 6 and Lemma 7 hold for slots $T, \cdots, t+1$. We now prove that they hold for slot $t$.
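We first prove Lemma 6. We distinguish the following three cases, considering $l < m$:

• Case 1: $l \ge k+1$. In this case, we have

$$\begin{aligned}
&W_t(\omega_1, \cdots, \omega_l, \cdots, \omega_m, \cdots, \omega_N) - W_t(\omega_1, \cdots, \omega_m, \cdots, \omega_l, \cdots, \omega_N) \\
&\quad = (\omega_l - \omega_m)\,[W_t(\omega_1, \cdots, 1, \cdots, 0, \cdots, \omega_N) - W_t(\omega_1, \cdots, 0, \cdots, 1, \cdots, \omega_N)] \\
&\quad = (\omega_l - \omega_m)\,\beta \sum_{\mathcal{E}\subseteq\mathcal{N}(k)} \Pr(\mathcal{N}(k),\mathcal{E})\,[W_{t+1}(\Omega^{\mathcal{E}}(t+1)) - W_{t+1}({\Omega'}^{\mathcal{E}}(t+1))],
\end{aligned}$$

where

$$\Omega^{\mathcal{E}}(t+1) = \big(P^{\mathcal{E}}_{11}, \tau(\omega_{k+1}), \cdots, p_{11}, \cdots, p_{01}, \cdots, \tau(\omega_N), Q^{\mathcal{N}(k),\mathcal{E}}\big),$$
$${\Omega'}^{\mathcal{E}}(t+1) = \big(P^{\mathcal{E}}_{11}, \tau(\omega_{k+1}), \cdots, p_{01}, \cdots, p_{11}, \cdots, \tau(\omega_N), Q^{\mathcal{N}(k),\mathcal{E}}\big).$$

It follows from the induction result that $W_{t+1}(\Omega^{\mathcal{E}}(t+1)) \ge W_{t+1}({\Omega'}^{\mathcal{E}}(t+1))$. Hence

$$W_t(\omega_1, \cdots, \omega_l, \cdots, \omega_m, \cdots, \omega_N) \ge W_t(\omega_1, \cdots, \omega_m, \cdots, \omega_l, \cdots, \omega_N).$$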

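• Case 2: $l \le k$ and $m \ge k+1$. In this case, denote $\mathcal{M} = \mathcal{N}(k) \setminus \{l\}$; it can be noted that $Q^{\mathcal{M},\mathcal{E}} = (Q^{\mathcal{M},\mathcal{E},l}, \bar{Q}^{\mathcal{M},\mathcal{E},l})$. We have

$$\begin{aligned}
&W_t(\omega_1, \cdots, \omega_l, \cdots, \omega_m, \cdots, \omega_N) - W_t(\omega_1, \cdots, \omega_m, \cdots, \omega_l, \cdots, \omega_N) \\
&= (\omega_l - \omega_m)\,[W_t(\omega_1, \cdots, 1, \cdots, 0, \cdots, \omega_N) - W_t(\omega_1, \cdots, 0, \cdots, 1, \cdots, \omega_N)] \\
&= (\omega_l - \omega_m)\Big[F(\omega_1, \cdots, 1, \cdots, \omega_k) - F(\omega_1, \cdots, 0, \cdots, \omega_k) \\
&\qquad + \beta \sum_{\mathcal{E}\subseteq\mathcal{M}} \Pr(\mathcal{M},\mathcal{E}) \big[(1-\epsilon)\, W_{t+1}\big(P^{\mathcal{E}}_{11}, p_{11}, \tau(\omega_{k+1}), \cdots, p_{01}, \cdots, \tau(\omega_N), Q^{\mathcal{M},\mathcal{E}}\big) \\
&\qquad\quad + \epsilon\, W_{t+1}\big(P^{\mathcal{E}}_{11}, \tau(\omega_{k+1}), \cdots, p_{01}, \cdots, \tau(\omega_N), Q^{\mathcal{M},\mathcal{E},l}, p_{11}, \bar{Q}^{\mathcal{M},\mathcal{E},l}\big) \\
&\qquad\quad - W_{t+1}\big(P^{\mathcal{E}}_{11}, \tau(\omega_{k+1}), \cdots, p_{11}, \cdots, \tau(\omega_N), Q^{\mathcal{M},\mathcal{E},l}, p_{01}, \bar{Q}^{\mathcal{M},\mathcal{E},l}\big)\big]\Big] \\
&\ge (\omega_l - \omega_m)\Big[\Delta_{\min} + \beta \sum_{\mathcal{E}\subseteq\mathcal{M}} \Pr(\mathcal{M},\mathcal{E}) \big[(1-\epsilon)\, W_{t+1}\big(p_{01}, P^{\mathcal{E}}_{11}, p_{11}, \tau(\omega_{k+1}), \cdots, \tau(\omega_N), Q^{\mathcal{M},\mathcal{E}}\big) \\
&\qquad\quad + \epsilon\, W_{t+1}\big(p_{01}, P^{\mathcal{E}}_{11}, \tau(\omega_{k+1}), \cdots, \tau(\omega_N), Q^{\mathcal{M},\mathcal{E}}, p_{11}\big) \\
&\qquad\quad - W_{t+1}\big(P^{\mathcal{E}}_{11}, p_{11}, \tau(\omega_{k+1}), \cdots, \tau(\omega_N), Q^{\mathcal{M},\mathcal{E}}, p_{01}\big)\big]\Big] \\
&\ge (\omega_l - \omega_m)\Big[\Delta_{\min} - \beta \sum_{\mathcal{E}\subseteq\mathcal{M}} \Pr(\mathcal{M},\mathcal{E}) \Big((1-\epsilon)(1-p_{01})\Delta_{\max} + \epsilon(p_{11}-p_{01})\Delta_{\max}\, \frac{1-[\beta(1-\epsilon)(p_{11}-p_{01})]^{T-t}}{1-\beta(1-\epsilon)(p_{11}-p_{01})}\Big)\Big] \\
&\ge (\omega_l - \omega_m) \sum_{\mathcal{E}\subseteq\mathcal{M}} \Pr(\mathcal{M},\mathcal{E}) \Big[\Delta_{\min} - \beta\Big((1-\epsilon)(1-p_{01})\Delta_{\max} + \frac{\epsilon(p_{11}-p_{01})\Delta_{\max}}{1-(1-\epsilon)(p_{11}-p_{01})}\Big)\Big] \ge 0,
\end{aligned}$$

where the first inequality follows from the induction result of Lemma 6, the second inequality follows from the induction result of Lemma 7, the third bounds the geometric factor using $\beta \le 1$, and the last inequality follows from the condition in the lemma.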

• Case 3: $l, m \le k$. This case follows from Lemma 3.

Lemma 6 is thus proven for slot $t$.
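The final step of Case 2 needs the bracket $\Delta_{\min} - \beta[(1-\epsilon)(1-p_{01})\Delta_{\max} + \epsilon(p_{11}-p_{01})\Delta_{\max}/(1-(1-\epsilon)(p_{11}-p_{01}))]$ to be non-negative. The sketch below, which assumes the lemma's condition takes exactly this form, computes the largest discount factor for which the bracket stays non-negative. The function and parameter names are hypothetical.

```python
from fractions import Fraction


def per_slot_loss(eps, p01, p11):
    """Coefficient of beta * Delta_max in the bracket (assumed form)."""
    return (1 - eps) * (1 - p01) + eps * (p11 - p01) / (1 - (1 - eps) * (p11 - p01))


def beta_threshold(eps, p01, p11, d_min, d_max):
    """Largest beta for which the bracket is non-negative."""
    return d_min / (d_max * per_slot_loss(eps, p01, p11))


def condition_margin(beta, eps, p01, p11, d_min, d_max):
    """Value of the bracket; it should be >= 0 for the argument to go through."""
    return d_min - beta * d_max * per_slot_loss(eps, p01, p11)


if __name__ == "__main__":
    args = (Fraction(1, 10), Fraction(1, 5), Fraction(4, 5), Fraction(1), Fraction(1))
    print("threshold:", beta_threshold(*args))
    print("margin at beta = 1:", condition_margin(Fraction(1), *args))
```

When the threshold exceeds 1, as in this example with $\Delta_{\min} = \Delta_{\max}$, every discount factor $\beta \le 1$ satisfies the requirement.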

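We then proceed to prove Lemma 7. We start with the first inequality. We develop $W_t$ with respect to $\omega_k$ and $\omega_N$ according to Lemma 4 as follows:

$$\begin{aligned}
&W_t(\omega_1, \cdots, \omega_N) - W_t(\omega_N, \omega_1, \cdots, \omega_{N-1}) \\
&= \omega_k\omega_N\,[W_t(\omega_1, \cdots, \omega_{k-1}, 1, \omega_{k+1}, \cdots, \omega_{N-1}, 1) - W_t(1, \omega_1, \cdots, \omega_{k-1}, 1, \omega_{k+1}, \cdots, \omega_{N-1})] \\
&\quad + \omega_k(1-\omega_N)\,[W_t(\omega_1, \cdots, \omega_{k-1}, 1, \omega_{k+1}, \cdots, \omega_{N-1}, 0) - W_t(0, \omega_1, \cdots, \omega_{k-1}, 1, \omega_{k+1}, \cdots, \omega_{N-1})] \\
&\quad + (1-\omega_k)\omega_N\,[W_t(\omega_1, \cdots, \omega_{k-1}, 0, \omega_{k+1}, \cdots, \omega_{N-1}, 1) - W_t(1, \omega_1, \cdots, \omega_{k-1}, 0, \omega_{k+1}, \cdots, \omega_{N-1})] \\
&\quad + (1-\omega_k)(1-\omega_N)\,[W_t(\omega_1, \cdots, \omega_{k-1}, 0, \omega_{k+1}, \cdots, \omega_{N-1}, 0) - W_t(0, \omega_1, \cdots, \omega_{k-1}, 0, \omega_{k+1}, \cdots, \omega_{N-1})].
\end{aligned} \tag{17}$$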
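The four-term development in (17) uses only the fact that the value function is affine in each of the two beliefs being expanded. The sketch below illustrates this on a hypothetical two-argument function `example`, chosen arbitrarily; it stands in for $W_t$ viewed as a function of two beliefs and is not derived from the paper.

```python
from fractions import Fraction


def corner_expansion(f, a, b):
    """Rebuild f(a, b) from its values at the four corners (0/1, 0/1).
    Valid whenever f is affine in a and affine in b."""
    return (a * b * f(1, 1)
            + a * (1 - b) * f(1, 0)
            + (1 - a) * b * f(0, 1)
            + (1 - a) * (1 - b) * f(0, 0))


def example(a, b):
    """Hypothetical function, affine in each argument separately."""
    return 2 + 3 * a - b + 5 * a * b


def exchange_gap(f, a, b):
    """f(a, b) - f(b, a), which collapses to (a - b) * (f(1, 0) - f(0, 1))."""
    return f(a, b) - f(b, a)


if __name__ == "__main__":
    a, b = Fraction(3, 4), Fraction(1, 3)
    print(corner_expansion(example, a, b), example(a, b))
    print(exchange_gap(example, a, b), (a - b) * (example(1, 0) - example(0, 1)))
```

The same expansion explains why exchanging two beliefs changes the value by a multiple of their difference: the "1,1" and "0,0" corner terms cancel.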

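We proceed by upper-bounding the four terms in (17).
For the first term, we have

$$\begin{aligned}
&W_t(\omega_1, \cdots, \omega_{k-1}, 1, \omega_{k+1}, \cdots, \omega_{N-1}, 1) - W_t(1, \omega_1, \cdots, \omega_{k-1}, 1, \omega_{k+1}, \cdots, \omega_{N-1}) \\
&= \beta \sum_{\mathcal{E}\subseteq\mathcal{N}(k-1)} \Pr(\mathcal{N}(k-1),\mathcal{E}) \big[(1-\epsilon)\, W_{t+1}\big(P^{\mathcal{E}}_{11}, p_{11}, \Phi(k+1,N-1), p_{11}, Q^{\mathcal{N}(k-1),\mathcal{E}}\big) \\
&\qquad + \epsilon\, W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N-1), p_{11}, Q^{\mathcal{N}(k-1),\mathcal{E}}, p_{11}\big) \\
&\qquad - (1-\epsilon)\, W_{t+1}\big(p_{11}, P^{\mathcal{E}}_{11}, p_{11}, \Phi(k+1,N-1), Q^{\mathcal{N}(k-1),\mathcal{E}}\big) \\
&\qquad - \epsilon\, W_{t+1}\big(P^{\mathcal{E}}_{11}, p_{11}, \Phi(k+1,N-1), p_{11}, Q^{\mathcal{N}(k-1),\mathcal{E}}\big)\big] \le 0,
\end{aligned}$$

where the inequality follows from the induction result of Lemma 6.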
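For the second term, we have

$$\begin{aligned}
&W_t(\omega_1, \cdots, \omega_{k-1}, 1, \omega_{k+1}, \cdots, \omega_{N-1}, 0) - W_t(0, \omega_1, \cdots, \omega_{k-1}, 1, \omega_{k+1}, \cdots, \omega_{N-1}) \\
&= F(\omega_1, \cdots, \omega_{k-1}, 1) - F(0, \omega_1, \cdots, \omega_{k-1}) \\
&\quad + \beta \sum_{\mathcal{E}\subseteq\mathcal{N}(k-1)} \Pr(\mathcal{N}(k-1),\mathcal{E}) \big[(1-\epsilon)\, W_{t+1}\big(P^{\mathcal{E}}_{11}, p_{11}, \Phi(k+1,N-1), p_{01}, Q^{\mathcal{N}(k-1),\mathcal{E}}\big) \\
&\qquad + \epsilon\, W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N-1), p_{01}, Q^{\mathcal{N}(k-1),\mathcal{E}}, p_{11}\big) - W_{t+1}\big(P^{\mathcal{E}}_{11}, p_{11}, \Phi(k+1,N-1), p_{01}, Q^{\mathcal{N}(k-1),\mathcal{E}}\big)\big] \\
&= F(\omega_1, \cdots, \omega_{k-1}, 1) - F(0, \omega_1, \cdots, \omega_{k-1}) \\
&\quad + \beta \sum_{\mathcal{E}\subseteq\mathcal{N}(k-1)} \Pr(\mathcal{N}(k-1),\mathcal{E}) \big[\epsilon\, W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N-1), p_{01}, Q^{\mathcal{N}(k-1),\mathcal{E}}, p_{11}\big) \\
&\qquad - \epsilon\, W_{t+1}\big(P^{\mathcal{E}}_{11}, p_{11}, \Phi(k+1,N-1), p_{01}, Q^{\mathcal{N}(k-1),\mathcal{E}}\big)\big] \le \Delta_{\max},
\end{aligned}$$

following the induction result of Lemma 6.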
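For the third term, we have

$$\begin{aligned}
&W_t(\omega_1, \cdots, \omega_{k-1}, 0, \omega_{k+1}, \cdots, \omega_{N-1}, 1) - W_t(1, \omega_1, \cdots, \omega_{k-1}, 0, \omega_{k+1}, \cdots, \omega_{N-1}) \\
&= F(\omega_1, \cdots, \omega_{k-1}, 0) - F(1, \omega_1, \cdots, \omega_{k-1}) \\
&\quad + \beta \sum_{\mathcal{E}\subseteq\mathcal{N}(k-1)} \Pr(\mathcal{N}(k-1),\mathcal{E}) \big[W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N-1), p_{11}, Q^{\mathcal{N}(k-1),\mathcal{E}}, p_{01}\big) \\
&\qquad - (1-\epsilon)\, W_{t+1}\big(p_{11}, P^{\mathcal{E}}_{11}, p_{01}, \Phi(k+1,N-1), Q^{\mathcal{N}(k-1),\mathcal{E}}\big) - \epsilon\, W_{t+1}\big(P^{\mathcal{E}}_{11}, p_{01}, \Phi(k+1,N-1), p_{11}, Q^{\mathcal{N}(k-1),\mathcal{E}}\big)\big] \\
&\le -\Delta_{\min} + \beta \sum_{\mathcal{E}\subseteq\mathcal{N}(k-1)} \Pr(\mathcal{N}(k-1),\mathcal{E}) \big[W_{t+1}\big(P^{\mathcal{E}}_{11}, p_{11}, \Phi(k+1,N-1), Q^{\mathcal{N}(k-1),\mathcal{E}}, p_{01}\big) \\
&\qquad - (1-\epsilon)\, W_{t+1}\big(p_{01}, p_{11}, P^{\mathcal{E}}_{11}, \Phi(k+1,N-1), Q^{\mathcal{N}(k-1),\mathcal{E}}\big) - \epsilon\, W_{t+1}\big(p_{01}, P^{\mathcal{E}}_{11}, \Phi(k+1,N-1), Q^{\mathcal{N}(k-1),\mathcal{E}}, p_{11}\big)\big] \\
&\le \sum_{\mathcal{E}\subseteq\mathcal{N}(k-1)} \Pr(\mathcal{N}(k-1),\mathcal{E}) \Big[-\Delta_{\min} + \beta\Big((1-\epsilon)(1-p_{01})\Delta_{\max} + \epsilon(p_{11}-p_{01})\Delta_{\max}\, \frac{1-[\beta(1-\epsilon)(p_{11}-p_{01})]^{T-t}}{1-\beta(1-\epsilon)(p_{11}-p_{01})}\Big)\Big] \\
&\le \sum_{\mathcal{E}\subseteq\mathcal{N}(k-1)} \Pr(\mathcal{N}(k-1),\mathcal{E}) \Big[-\Delta_{\min} + \beta\Big((1-\epsilon)(1-p_{01})\Delta_{\max} + \frac{\epsilon(p_{11}-p_{01})\Delta_{\max}}{1-(1-\epsilon)(p_{11}-p_{01})}\Big)\Big] \le 0,
\end{aligned}$$

where the first inequality follows from the induction result of Lemma 6, the second inequality follows from the induction result of Lemma 7, and the fourth inequality is due to the condition in Lemma 7.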
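For the fourth term, we have

$$\begin{aligned}
&W_t(\omega_1, \cdots, \omega_{k-1}, 0, \omega_{k+1}, \cdots, \omega_{N-1}, 0) - W_t(0, \omega_1, \cdots, \omega_{k-1}, 0, \omega_{k+1}, \cdots, \omega_{N-1}) \\
&= \beta \sum_{\mathcal{E}\subseteq\mathcal{N}(k-1)} \Pr(\mathcal{N}(k-1),\mathcal{E}) \big[W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N-1), p_{01}, Q^{\mathcal{N}(k-1),\mathcal{E}}, p_{01}\big) \\
&\qquad - W_{t+1}\big(P^{\mathcal{E}}_{11}, p_{01}, \Phi(k+1,N-1), Q^{\mathcal{N}(k-1),\mathcal{E}}, p_{01}\big)\big] \\
&= \beta \sum_{\mathcal{E}\subseteq\mathcal{N}(k-1)} \Pr(\mathcal{N}(k-1),\mathcal{E}) \big[W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N-1), p_{01}, Q^{\mathcal{N}(k-1),\mathcal{E}}, p_{01}\big) \\
&\qquad - W_{t+1}\big(p_{01}, P^{\mathcal{E}}_{11}, \Phi(k+1,N-1), Q^{\mathcal{N}(k-1),\mathcal{E}}, p_{01}\big)\big] \\
&\le \beta \sum_{\mathcal{E}\subseteq\mathcal{N}(k-1)} \Pr(\mathcal{N}(k-1),\mathcal{E}) \big[W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N-1), Q^{\mathcal{N}(k-1),\mathcal{E}}, p_{01}, p_{01}\big) \\
&\qquad - W_{t+1}\big(p_{01}, P^{\mathcal{E}}_{11}, \Phi(k+1,N-1), Q^{\mathcal{N}(k-1),\mathcal{E}}, p_{01}\big)\big] \\
&\le (1-p_{01})\beta\Delta_{\max},
\end{aligned}$$

where the second equality follows from Lemma 3, the first inequality follows from the induction result of Lemma 6, and the second inequality follows from the induction result of Lemma 7.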
Combining the above results for the four terms, we have

$$\begin{aligned}
W_t(\omega_1, \cdots, \omega_N) - W_t(\omega_N, \omega_1, \cdots, \omega_{N-1})
&\le \omega_k(1-\omega_N)\Delta_{\max} + (1-\omega_k)(1-\omega_N)(1-p_{01})\beta\Delta_{\max} \\
&\le \omega_k(1-\omega_N)\Delta_{\max} + (1-\omega_k)(1-\omega_N)\Delta_{\max} \le (1-\omega_N)\Delta_{\max},
\end{aligned}$$

which completes the proof of the first part of Lemma 7.

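Finally, we prove the second part of Lemma 7. To this end, denote $\mathcal{M} = \{2, \cdots, k\}$. We have

$$\begin{aligned}
&W_t(\omega_1, \cdots, \omega_N) - W_t(\omega_N, \omega_2, \cdots, \omega_{N-1}, \omega_1) \\
&= (\omega_1 - \omega_N)\,[W_t(1, \omega_2, \cdots, \omega_{N-1}, 0) - W_t(0, \omega_2, \cdots, \omega_{N-1}, 1)] \\
&= (\omega_1 - \omega_N)\Big[F(1, \omega_2, \cdots, \omega_k) - F(0, \omega_2, \cdots, \omega_k) \\
&\qquad + \beta \sum_{\mathcal{E}\subseteq\mathcal{M}} \Pr(\mathcal{M},\mathcal{E}) \big[(1-\epsilon)\, W_{t+1}\big(P^{\mathcal{E}}_{11}, p_{11}, \Phi(k+1,N-1), p_{01}, Q^{\mathcal{M},\mathcal{E}}\big) \\
&\qquad\quad + \epsilon\, W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N-1), p_{01}, p_{11}, Q^{\mathcal{M},\mathcal{E}}\big) - W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N-1), p_{11}, p_{01}, Q^{\mathcal{M},\mathcal{E}}\big)\big]\Big] \\
&\le (\omega_1 - \omega_N)\Big(\Delta_{\max} + \beta \sum_{\mathcal{E}\subseteq\mathcal{M}} \Pr(\mathcal{M},\mathcal{E}) \big[(1-\epsilon)\, W_{t+1}\big(P^{\mathcal{E}}_{11}, p_{11}, \Phi(k+1,N-1), p_{01}, Q^{\mathcal{M},\mathcal{E}}\big) \\
&\qquad\quad + \epsilon\, W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N-1), p_{01}, p_{11}, Q^{\mathcal{M},\mathcal{E}}\big) - W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N-1), p_{01}, p_{11}, Q^{\mathcal{M},\mathcal{E}}\big)\big]\Big) \\
&= (\omega_1 - \omega_N)\Big(\Delta_{\max} + \beta \sum_{\mathcal{E}\subseteq\mathcal{M}} \Pr(\mathcal{M},\mathcal{E}) \big[(1-\epsilon)\, W_{t+1}\big(P^{\mathcal{E}}_{11}, p_{11}, \Phi(k+1,N-1), p_{01}, Q^{\mathcal{M},\mathcal{E}}\big) \\
&\qquad\quad - (1-\epsilon)\, W_{t+1}\big(P^{\mathcal{E}}_{11}, \Phi(k+1,N-1), p_{01}, p_{11}, Q^{\mathcal{M},\mathcal{E}}\big)\big]\Big) \\
&\le (\omega_1 - \omega_N)\Big(\Delta_{\max} + \beta \sum_{\mathcal{E}\subseteq\mathcal{M}} \Pr(\mathcal{M},\mathcal{E}) \big[(1-\epsilon)\, W_{t+1}\big(P^{\mathcal{E}}_{11}, p_{11}, \Phi(k+1,N-1), Q^{\mathcal{M},\mathcal{E}}, p_{01}\big) \\
&\qquad\quad - (1-\epsilon)\, W_{t+1}\big(p_{01}, P^{\mathcal{E}}_{11}, \Phi(k+1,N-1), Q^{\mathcal{M},\mathcal{E}}, p_{11}\big)\big]\Big) \\
&\le (p_{11} - p_{01})\Big[\Delta_{\max} + \beta(1-\epsilon)\, \frac{1-[\beta(1-\epsilon)(p_{11}-p_{01})]^{T-t}}{1-\beta(1-\epsilon)(p_{11}-p_{01})}\,(p_{11}-p_{01})\Delta_{\max}\Big] \\
&= \frac{1-[\beta(1-\epsilon)(p_{11}-p_{01})]^{T-t+1}}{1-\beta(1-\epsilon)(p_{11}-p_{01})}\,(p_{11}-p_{01})\Delta_{\max},
\end{aligned}$$

where the first two inequalities follow from the recursive application of the induction result of Lemma 6, and the third inequality follows from the induction result of Lemma 7.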

This completes the proof of Lemma 6 and Lemma 7.
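The second bound of Lemma 7 grows from slot to slot through a geometric factor in $x = \beta(1-\epsilon)(p_{11}-p_{01})$. The sketch below checks, with exact rationals, that one induction step turns a factor with $n$ terms into one with $n+1$ terms, and that the factor never exceeds $1/(1-(1-\epsilon)(p_{11}-p_{01}))$. It assumes $0 \le p_{01} \le p_{11} \le 1$ and $0 \le \beta \le 1$; the function names are illustrative.

```python
from fractions import Fraction


def geometric_factor(x, n):
    """1 + x + ... + x**(n-1), i.e. (1 - x**n) / (1 - x) for x != 1."""
    return sum(x ** i for i in range(n))


def induction_step(x, n):
    """Factor obtained at slot t from the factor with n terms at slot t+1."""
    return 1 + x * geometric_factor(x, n)


def uniform_bound(eps, p01, p11):
    """Horizon-independent bound 1 / (1 - (1 - eps)(p11 - p01))."""
    return 1 / (1 - (1 - eps) * (p11 - p01))


if __name__ == "__main__":
    beta, eps = Fraction(9, 10), Fraction(1, 10)
    p01, p11 = Fraction(1, 5), Fraction(4, 5)
    x = beta * (1 - eps) * (p11 - p01)
    print(induction_step(x, 3) == geometric_factor(x, 4))
    print(geometric_factor(x, 50) <= uniform_bound(eps, p01, p11))
```

The uniform bound is what allows the conditions of the lemmas to be stated without reference to the horizon $T$.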